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Abstract —We investigate the problem of estimating 
a given real symmetric signal matrix C from a noisy 
observation matrix M in the limit of large dimension. 
We consider the case where the noisy measurement M 
comes either from an arbitrary additive or multiplicative 
rotational Invariant perturbation. We establish, using the 
Replica method, the asymptotic global law estimate for 
three general classes of noisy matrices, significantly ex¬ 
tending previously obtained results. We give exact results 
concerning the asymptotic deviations (called overlaps) of 
the perturbed eigenvectors away from the true ones, and 
we explain how to use these overlaps to “clean” the noisy 
eigenvalues of M. We provide some numerical checks for 
the different estimators proposed in this paper and we 
also make the connexion with some well known results 
of Bayesian statistics. 

1. Introduction 

One of the most challenging problem in modern 
statistical analysis is to extract a true signal from noisy 
observations in data sets of very large dimensionality. 
Be it in physics, genomics, engineering or finance, sci¬ 
entists are confronted with datasets where the sample 
size T and the number of variables N are both very 
large, but with an observation ratio q = N/T that is not 
small compared to unity. This setting is known in the 
literature as the high-dimensional limit and differs from 
the traditional large T, fixed N situation (i.e. q —> 0), 
meaning that classical results of multivariate statistics 
do not necessarily apply. 

However, when one deals with very large random 
matrices (such as covariance matrices), one expects 
the spectral measure of the matrix under scrutiny to 
exhibit some universal properties, which are indepen¬ 
dent of the specific realization of the matrix itself. 
This property is at the core of Random Matrix Theory 
(RMT), which provides a very precise description of 


the convergence of the spectral measure for a very 
large class of random matrices. Perhaps the two most 
influential results are Wigner’s semicircle law [1] and 
Marcenko and Pastur’s theorem [2]. As far as infer¬ 
ence is concerned, the latter result is arguably the 
cornerstone result of RMT in the sense that it gives 
theoretical tools to understand why classical estimators 
are insufficient and is now at the heart of many 
applications in this field (for reviews, see e.g. [3], [4], 
[5], [6] or more recently [7] and references therein). 

In this paper, we consider the statistical problem of 
a N X N matrix C which stands for the unknown 
signal that one would like to estimate from the noisy 
measurement of a A x A matrix M in the limit of large 
dimension A —> oo. A natural question in statistics is 
to find an estimator H(M) of the true signal C that 
depends on the dataset M we have. The true matrix C 
is unknown and we do not have any particular insights 
on its components (the eigenvectors). Therefore we 
would like our estimator H(M) to be constructed in a 
rotationally invariant way from the noisy observation 
M that we have. In simple terms, this means that 
there is no privileged direction in the A-dimensional 
space that would allow one to bias the eigenvectors of 
the estimator H(M) in some special directions. More 
formally, the estimator construction must obey: 

(I.l) 

for any rotation matrix fJ. Any estimator satisfying 
Eq. (I.l) will be referred to as a Rotational Invariant 
Estimator (RIE). In this case, it turns out that the 
eigenvectors of the estimator H(M) have to be the 
same as those of the noisy matrix M [8], [9]. As 
we will show in Section II, this implies that the best 
possible estimator H(M) depends on the overlaps (i.e. 
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the squared scalar product) between the eigenvectors 
of C and those of M. These overlaps turn out to 
be fully computable in the large N limit, using tools 
from Random Matrix Theory, for a wide class of noise 
sources, much beyond the usual Gaussian models. 

The study of the eigenvectors for statistical purposes 
is in fact quite a recent topic in random matrices. 
For sample covariance matrices, such considerations 
have been studied in [10] and [11]. In the latter paper, 
the notion of overlap and optimal (oracle) estimator 
are treated in great details. As far as we know, this 
is the only paper in the literature where the oracle 
estimator is related to random matrices. Beside the 
sample covariance matrix, the problem of the overlap 
for a Gaussian matrix with an external source (also 
named as deformed Wigner ensemble) has been treated 
first in [12] and then reconsidered in a more general 
setting in [13] using Dyson Brownian motions (for an 
early - but extremely brief - mention of these overlaps, 
see [14, Section 6.2]). However, no mention on how to 
clean the ‘noisy’ matrix was given in [12], [13]. It is the 
gap we hope to fill here. We also extend these results to 
a much broader class of random perturbations, which, 
to the best of our knowledge, was not considered 
before. 

The outline of this paper is organized as follows. We 
introduce in Section II-A some notations and show that 
the optimal (oracle) RI estimator involves the overlaps 
between the eigenvectors of the signal matrix C and 
its noisy estimate M. In Section II-B, we observe that 
a convergence result on the resolvent of M not only 
gives us all the information about the eigenvalues, 
but also the eigenvectors. After motivating the study 
of the resolvent of the measurement matrix M, we 
provide in Section III explicit expressions for three 
different perturbation processes. The first one is the 
case where we add a noisy matrix that is free with 
respect to the signal C. The second model concerns 
multiplicative perturbations and includes the sample 
covariance matrix of (elliptically distributed) random 
variables. We also reconsider the case of the so-called 
‘Information-Plus-Noise’ matrix that deals with sam¬ 
ple covariance matrices constructed from rectangular 
Gaussian matrices with an external source. The evalu¬ 
ation of the resolvent for each model is based on the 
powerful but non-rigorous replica method (which has 
been extremely successful in various contexts, includ¬ 
ing RMT or disordered systems- see [15], or [16] for 


a more recent review). We will see that the derivation 
of our results using replicas can be done without too 
much effort and one can certainly imagine that our 
results can be proven rigorously, as was done in [17] 
for the resolvent or [11], [12], [13] for the overlaps 
of covariance matrices and Gaussian matrices with 
external sources. We relegate all these technicalities 
in various appendices and only give our final results 
and their numerical verifications in Section III. Note in 
passing that we obtain using replicas the multiplication 
law of the 5-transform for product of free matrices (see 
Appendix B-C), a derivation that we have not seen 
in the literature before. In Section IV, we come back 
to the problem of statistical inference and apply the 
results of Section III to derive the optimal RIE for 
each considered model. In the multiplicative case, we 
recover and generalize the estimator recently derived 
by Ledoit and Peche for covariance matrices [11]. Each 
estimator is illustrated by numerical simulations, and 
we also provide some analytical formulas that can 
be of particular interest for real life problems. We 
then conclude this work with some open problems and 
possible applications of our results. 

Conventions. We use bold capital letters for matri¬ 
ces and bold lowercase letters for vectors. We denote 
usual RMT spectral transforms with calligraphic font. 
Einally, all acronyms and notations are summarized in 
Appendix C. 

II. Rotationally Invariant Estimators, 
Eigenvector Overlaps and the Resolvent 

A. The oracle estimator and the overlaps 

Throughout this work, we will consider the signal 
matrix C to be a symmetric matrix of dimension N 
with N that goes to infinity. We denote by ci ^ C 2 ^ 
... ^ cjv its eigenvalues and by |vi), |v 2 ),..., |vjv) 
their corresponding eigenvectors. The perturbed matrix 
M will be assumed to be symmetric with eigenvalues 
denoted byAi ^ A 2 ^ ... ^ Xn associated to the 
eigenvectors |ui), |u 2 ),..., \uj^). In the limit of large 
dimension, it is often more convenient to index the 
eigenvectors of both matrices by their corresponding 
eigenvalues, i.e. ju^) —> [uaJ and |vi) —)■ |vcj for 
any integer 1 ^ i ^ N, and this is the convention that 
we adopt henceforth. 

We now attempt to construct an optimal estimator 
H(M) of the true signal C that relies on the given 
dataset M at our disposal. We recall that our main 
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assumption is that we have no prior insights on the 
eigenvectors of the matrix C so that all estimators 
considered below satisfy Eq. (I.l). We conclude from 
the seminal work of [9] that any Rotational Invariant 
Estimator H(M) of C shares the same eigenbasis as 
M, that is to say 

N 

H(M) = ^e.|w)(w|, (II.l) 

where the eigenvalues ,..., are the quantities we 
wish to estimate. Next, the optimality of an estimator 
is defined with respect to a specific loss function (e.g. 
the distance) and a standard metric is to consider the 
(squared) Euclidean (or Erobenius) norm that we shall 
denote by 

||C-H(M)||j^^ := Tr [(C - H(M))2] , 

for a given RIE H(M). The best estimator with respect 
to this loss function is the solution of the following 
minimization problem 

5 (M) = argmin 11C - H (M) 11 (II.2) 

RIH(M) ^ 

considered over the set of all possible RI estimators 
H(M). Since the only free variables in the constrained 
optimization problem (II.2) are the eigenvalues of 
H(M), it is easy to find the optimal solution; 


substitution” procedure independently proposed in [5], 
[18], aside from the trivial case C = In- 
We shall also see that the estimator is self¬ 
averaging in the large A^-limit (in the sense that it 
converges almost surely, see Section II-B) and can thus 
be approximated by its expectation value 

N 

[(u,|vj)2] cj. 

i=i 

where E [•] denotes the expectation value with respect 
to the random eigenvectors (|ui))i of the matrix M. We 
will often use the following notation for the (rescaled) 
mean square overlaps 

0{X„Cj) := iVE[(u,|vj)2] . (II.4) 

Eqs. (II.3) and (II.4) are the quantities of interest in 
this paper. In Statistics, the optimal solution (II.3) 
is sometimes called the oracle estimator because it 
depends explicitly on the knowledge of the true signal 
C. The “miracle” is that in the large N limit, and for a 
large class of problems, one can actually express this 
oracle estimator in terms of the (observable) limiting 
spectral density (LSD) of M only. 

B. Relation between the resolvent and the overlaps 


N N 

“(M) = Iw) (Uil , S = , 

i=i 

(II.3) 

where we see that the optimal eigenvalues depend 
on the overlaps between the perturbed ju^), the non- 
perturbed eigenvectors |vj ) and the eigenvalues of the 
true matrix C. A few comments on this estimator 
are in order. Eirst, the estimator is designed to 
construct the best RI estimator H(M) given in (II.l). 
The consequence is that if we restrict our estimator 
to have the eigenvectors of the noisy matrix M, then 
the naive approach that consists in substituting^ the 
eigenvalues with the true ones {ci}fLi yields 

to a spectrum that is too wide. Indeed, it is not hard 
to see from (II.3) that the top eigenvalues are shrunk 
downward while the bottom ones are shrunk upward. 
In other words, the empirical spectral density (ESD) of 
the is narrower than the true one which shows that 
the RI estimator cannot be attained by the “eigenvalues 

'Remember that we have ranked the eigenvalues. 


A convenient way to work out the overlap (II.4) is 
to study the resolvent of M, defined as 

Gm(z) := (zliv-M)'i. (II.5) 


The claim is that for 2 ; not too close to the real axis, 
the matrix Gm(z) is self-averaging in the large N 
limit so that its value is independent of the specific 
realization of M. More precisely, it means that Gm(^) 
converges to a deterministic matrix for any fixed value 
(i.e. independent of N) of 2 : € C \ M when N ^ 00 . 
We will refer to this deterministic limit as the global 
law of Gm( 2 :) in the following. 

The relation between the resolvent and the overlaps 
0{Xj, Ci) is relatively straightforward. Eor z = X — it] 
with A G K and 77 N~^, we have 


Gm(A — irj) 


E 


A -f i ?7 

(A-Afc)2+r72 


|ufe) (Ufel . 


If we take the trace of the above quantity, and take the 
limit r] ^ 0 (after N —> 00 ), one obtains the limiting 
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“density of states” (i.e. the LSD) pM (see Appendix 
A): 

ImflM(A-i?7) = Im^Tr[GM(A-iry)] =7rpM(A). 

(11.6) 

Similarly , the elements of ImGM(A — ir/) can be 
written for 77 > 0 as 

N 

(v,|lmGM(A-ip)|v,) = 

(11.7) 

This latter quantity is also self-averaging in the large 
N limit (the overlaps (vi|u/j)^,fc = with i 

fixed display asymptotic independence when N —>■ 00 
so that the law of large number applies here) and we 
have 

(vi|lmG m(A - i? 7 )|vi) 

/ TT-— ^0{n,c^)pM{fJ.)dn. 

Jg (A — 

where the overlap function 0{p,,Ci) is extended (con¬ 
tinuously) to arbitrary values of p inside the support of 
Pm in the large N limit. Sending p —> 0 in this latter 
equation, we finally obtain the following formula valid 
in the large N limit 

(vi|lmGM(A - i? 7 )|vi) « 7rpM(A)0(A, q). (II. 8 ) 

Eq. (II. 8 ) will thus enable us to investigate the overlaps 
0(A, Ci) in great details through the calculation of the 
elements of the resolvent Gm( 2 )- This is what we aim 
for in the next section. We emphasize that the different 
equations of the mean square overlaps 0(A, c^) below 
will be expressed in the basis where C is diagonal 
without loss of generality (see Appendix B for more 
details). 

III. Overlaps: Some Exact Results 
A. Free additive noise 

The first model of noisy measurement that we con¬ 
sider is the case where the true signal C is corrupted 
by a free additive noise, that is to say 

M = C-fOBO^ (III.l) 

where B is a fixed matrix with eigenvalues 61 > 62 > 

■ ■ ■ > bjs! with limiting spectral density pB and O is 
a random matrix chosen uniformly in the Orthogonal 
group 0{N) (i.e. according to the Haar measure). This 
family of models has found several applications in 


statistical physics of disordered systems subject to an 
external perturbation where the matrix M is interpreted 
as the Hamiltonian of the system, given by the sum of a 
deterministic term and a random term [19]. A simple 
example is when the noisy matrix OBO^ is a sym¬ 
metric Gaussian random matrix with independent and 
identically distributed (i.i.d.) entries, corresponding to 
the so-called GOE (Gaussian Orthogonal Ensemble). 
By construction, the eigenvectors of a GOE matrix are 
invariant under rotation. 

It is now well known that the spectral density of 
M can be obtained from that of C and B using free 
addition, see [ 20 ] and, in the language of statistical 
physics, [21]. The statistics of the eigenvalues of M 
has therefore been investigated in great details, see 
[22] and [23] for instance. However, the question of 
the eigenvectors has been much less studied, except 
recently in [12], [13] in the special case where OBO^ 
belongs to the GOE (see below). 

Eor a general free additive noise, we show in 
Appendix B-2 that the global law estimate for the 
resolvent reads in the large N limit: 

(Gm( 2 )) = Gc{Z{z)) (III.2) 

where the function Z{z) is given by 

Z(^) = z-7^B(0M(2)), (III.3) 

and T^b is the so-called 7^-transform of B (see Ap¬ 
pendix A for a reminder of the definition of the 
different useful spectral transforms). 

Note that Eq. (III.2) is a matrix relation, that sim¬ 
plifies when written in the basis where C is diagonal, 
since in this case Gc(Z) is also diagonal. Therefore, 
the evaluation of the overlap 0{X,c) is straightforward 
using Eq. (II. 8 ). Let us define the Hilbert transform 
f)M(A) which is simply the real part of the Stieltjes 
transform gM(A — ir/) in the limit p —> 0. Then the 
overlap for the free additive noise is given by: 

^ _ 

^ (A - c - ai(A))2 + 7T^(3i{XypM{Xy ’ 

(111.4) 

where c is the corresponding eigenvalue of the unper¬ 
turbed matrix C, and where we defined: 

q;i(A) := Re [7^B (flM(A) -f i 7 rpM(A))], 

Im [7^B (f)M(A)-f i 7 rpM(A))] (HI.S) 

-- 

As a first check of these results, let us consider 
the normalized trace of Eq. (IIL2) and then set u = 
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Fig. 1. Computations of the rescaled overlap 0(A, c) as a function 
of c in the free addition perturbation. We chose i = 250, C a Wishart 
matrix with parameter g = 0.5 and B a Wigner matrix with cr^ = 1. 
The black dotted points are computed using numerical simulations 
and the plain red curve is the theoretical predictions Eq. (III.4). The 
agreement is excellent. For i = 250, we have p:: 0.83 and we see 
that the peak of the curve is in that region. The same observation 
holds for i = 400 where Cj ^ 1.66. The numerical curves display 
the empirical mean values of the overlaps over 1000 samples of M 
given by Eq. (III.l) with C fixed. 


0 m('Z) = Qc{Z{z)). One can find by using the Blue 
transform, defined in (A. 3), that we indeed retrieve the 
free addition formula TZm{u) — 'Ti-c{u) +TZ-b{u) when 
N —> oo, as it should be. 

Deformed GOE: As a second verification, we spe¬ 
cialize our result to the case where OBO^ is a GOE 
matrix such that the entries have a variance equal to 
a'^/N. Then, one has TIb{z) = a'^z meaning that Eq. 
(111.3) simply becomes Z{z) = z — a^QM{z)- This 
allows us to get a simpler expression for the overlap: 


(c - A + (T2f,jyj(A))2 + cr 47 r 2 pM(A )2 ’ 

( 111 . 6 ) 

which is exactly the result derived in [12], [13] using 
other methods. In Eig. 1, we illustrate this formula 
in the case where C is an isotropic Wishart matrix 
of parameter q, by taking e.g. C = where 

H is a symmetric matrix of size N x T filled with 
i.i.d. standard Gaussian entries and q = N/T. We set 
N = 500, T = 1000, and take OBO^ as a GOE 
matrix with variance 1/A^. Eor a fixed C, we generate 
1000 samples of M given by Eq. (III.l) for which we 
can measure numerically the overlap quantity. We see 
that the theoretical prediction ( 111 . 6 ) agrees remarkably 
with the numerical simulations. 


B. Free multiplicative noise and empirical covariance 
matrices 

Our second model deals with multiplicative noise 
in the following sense: we consider that the noisy 
measurement matrix M can be written as 

M = VCOBO^yC, (III.7) 

where again C is the signal, B is a fixed matrix with 
eigenvalues bi > b 2 > ■ ■ ■ > bj^ with limiting density 
pR and O is a random matrix chosen in the Orthogonal 
group 0{N) according to the Haar measure. Note that 
we implicitly requires that C is positive definite with 
Eq. (III.7), so that the square root of C is well defined. 

An explicit example of such a problem is provided 
by sample covariance matrices where B is a Wishart 
matrix [24]), which is of particular interest in multi¬ 
variate statistical analysis. We shall come back later 
to this application. The Replica analysis leads to the 
following systems of equations (see Appendix B-C) 
for the general problem of a free multiplicative noise 
above, Eq. (III.7): 

z(Gm(z)) = Z{z)Gc{Z{z)), (III. 8 ) 

with: 

Ziz) = zSRizQMiz) - 1), (III.9) 

where 5 b is the so-called 5-transform of B (see 
Appendix A) and gM is the normalized trace of Gm(- 2 )- 
The latter obeys, from Eq. (III. 8 ), the self-consistent 
equation: 

zgM(z) = Z{z)QciZiz)). (III.IO) 

Again, Eq. (111.8) is a matrix relation, that simplifies 
when written in the basis where C is diagonal. Note 
that Eqs (III.IO) and (III.9) allow us to retrieve the 
usual free multiplicative convolution, that is to say: 

5m(u) =5c(u)5b(m). (III.l 1) 

This result is thus the analog of our result (III.2) in the 
multiplicative case. We refer the reader to the appendix 
B-C for more details. We emphasize that for technical 
reasons, we restrict B to have a normalized trace that 
differs from zero. 

With the global law estimate for the resolvent given 
by Eqs. (III. 8 ) and (III.9) above, we can obtain a 
general overlap formula for the free multiplicative 
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noise case. Let us define the following functions 

1 



lim Re 
A—i0+ 


lim Im 
2—>-A—i0+ 


L^b(^5m(^) - 1) 

1 

5B(zgM(^) - 1), 


1 


7rpM(A)’ 

(III. 12) 

then the overlap 0(A,c) between the eigenvectors of 
C and M are given by: 

^ ^_ c/32 (A) _ 

(A - ca2(A))2 + 7r2c2/32(A)2pM(A)2 ■ 

(III. 13) 

In order to give more insights on our results, we 
will now specify these results to some well-known 
applications of multiplicative models in RMT. 

Empirical covariance matrix: As mentioned pre¬ 
viously, the most famous application of a model of 
the form (III.7) is given by the sample covariance 
estimator that we recall briefly. Let us define by 
R := {Ru) G the observation matrix whose 

columns represent the collected samples of size T 
that we assume to be independently and identically 
distributed with zero mean. The N elements of each 
sample generally display some degree of interdepen¬ 
dence, that is often represented by the true (or also 
population) covariance matrix C := (Cij) G 
defined as {RuRjt') = where St,t’ is the 

Kronecker symbol. As the signal C is unknown, the 
classical way to estimate the covariances is to compute 
the empirical (or sample) covariance matrix thanks to 
Pearson estimator 

M := ^RRI'= v^XX^'v^, (111.14) 

where Xis a N xT random matrix where all elements 
are i.i.d. random variables with zero mean and variance 
T~^. It is easy to see that this model is a particular 
case of the model (III.7) where B = XX^^ whose 5- 
transform has an explicit expression [3]: 


5b (x) = - 


1 


N 


q = 


(111.15) 


qx ^ T 

Using our general results Eqs. (III.8) and (III.9), we 
obtain 


z{Gm{z)) = Z{z)Gc{Z{z)), 

l-q+qzguiz) 


(111.16) 


which is exactly the result found in [25] and also 
in [17] at leading order. We can therefore recover 
the well-known Marcenko-Pastur equation [2] which 
gives a fixed point equation satisfied by the Stieltjes 


transform of M in term of the Stieltjes transform of 
the true matrix C 


zqm{z) = Z{z)qc{Z{z)) , 

l-q+qzguiz) 


(111.17) 


The expression of the limiting overlaps can be further 
simplified in this particular case to 


0(A,c) = 

qcX 

(c(l - q) - X + gcA[)M(A))2 -I- g 2 A 2 c 27 r 2 pM(A )2 

(111.18) 

and we recover, as expected, the Ledoit & Peche 
result established in [11]. As a conclusion, our result 
generalizes the standard Marcenko & Pastur formalism 
to an arbitrary multiplicative noise term OBO^. 

Elliptical ensemble: A slightly more general ap¬ 
plication of the model (III.7) is when we suppose 
that the entries of the observation matrix R can be 
written as the product of two independent sources 
Rit = cTtYit. The {Yit} are characterized by the true 
signal, i.e. Cij = (YitYjt')5t^t' and are generated 
independently from the same distribution at time t that 
will be assumed to be Gaussian in our case. The {at} 
are such that (cr^) = 1 and allows to add a time- 
dependent volatility with a factor at that is common to 
all variables at time t. This defines the class of elliptical 
distributions and the most famous application is when 
the {at} are drawn from a inverse-gamma distribution 
which leads to the multivariate Student distribution 
[26] (see Sec. (IV-B) below). The corresponding em¬ 
pirical correlation matrix can be written as 

M = a/CXSX^ VC, (III. 19) 

where E := diag((jj, ..., and X is defined 

as in Eq. (III. 14). This model has been subject to 
several studies in RMT, see e.g. [27] [28], [29] or 
[30]. In all these works, the expression of the limiting 
Stieltjes transform of the spectral density is quite 
complex, except for the case where C is the identity 
matrix. We find here that we can in fact obtain a 
self-consistent expression for the global law estimate 
of the corresponding resolvent by introducing the ap¬ 
propriate transforms. Our result generalizes the time- 
independent result of [25] or [17], and also provides a 
tractable equation for the limiting eigenvalues density. 

Before stating the result for the elliptical model 
(111.19), one has to be careful with the 5-transform 
of B. Indeed, it is in fact more convenient to work 
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with the “dual” matrix B* := v^X^CXy^ in order 
to use the free multiplication formula. We then obtain 
the 5-transform of B* to finally express the Stieltjes 
transform of pb, as a function gs, simply by noticing 
that B* has the same eigenvalues as B and the ad¬ 
ditional zero eigenvalue with multiplicity T — N. The 
hnal result reads, after elementary manipulations of the 
T -transform, 

5b.(x) = (111.20) 

x + q \qj 

In a nutshell, applying the result Eq. (III.9) to the 
elliptical case leads to the result 

Z{z) = - --—5s(g(zgM(z)-l)) (III.21) 

1- q + qzQMiz) 

with Eqs. (III.8) and (III. 10) unchanged. With Eq. 
(III.21), the general result (III. 10) extends the stan¬ 
dard Marcenko-Pastur formula to a time-dependent^ 
framework. Note that the corresponding self-consistent 
equation for the Stieltjes transform gM(^) has been 
obtained in previous studies [28], [29], [30] yet stated 
in a different form. One can easily specialize the result 
of the overlaps 0(A, c) to any A and c as a function 
of the spectral measure of E. However, we do not hnd 
an expression as tractable as Eq. (111.18). 

Einally, let us now show that Eq. (III. 10) can be 
useful for practical purposes in order to construct non¬ 
trivial models. Suppose that C is an inverse-Wishart 
matrix (see Section IV.B. for the dehnition of this law) 
with parameter k = 0.2 and dehne E to be a Wishart 
matrix of size TxT and parameter qo = 0.6. We follow 
the same numerical procedure as in the free additive 
noise case. We compare in Eig. 2 our theoretical 
result Eq. (III. 10) with empirical simulations and the 
agreement is remarkable. The same conclusion holds 
for the overlap (see Eig. 3). 

Note that the results obtained in this section could 
be extended to the case where the diagonal matrix E 
is not positive definite, for applications in regression 
analysis (see for example [31]). It can also be used 
for studying Maronna’s robust estimators of C as the 
deterministic equivalent of such estimators in the large 
N falls down into the model (III.7) [32]. 

C. Infonnation-Plus-Noise matrix 

The last model we will treat here is the so-called 
Tnformation-Plus-Noise’ family of matrix [33]. In this 

^in the sense that the volatility depends on the observation time 
t. 



X 

Fig. 2. Theoretical predictions of the density of states from Eq. 
(III.8) (red line) compared to simulated data when C is a 500 X 500 
inverse-Wishart matrix (parameter k = 0.2) and S is a white Wishart 
with 50 = 0.6. The agreement of our theoretical estimate is excellent 
and differs strongly from the classical Marcenko-Pastur density (blue 
dotted curve) 



Fig. 3. Rescaled overlap 0(A,c) as a function of Cj in the free 
multiplicative perturbation with N = 500. We chose C as an 
inverse-Wishart matrix with parameter k = 0.2 and S a Wishart 
matrix with 50 = 0.6. The black dotted points are computed 
using numerical simulations and the plain curves are the theoretical 
predictions Eq. (III. 13). For i = 250 (resp. i = 400), we have 
Ci Ri 0.37 (resp. Ci ^ 1.48) and we see that the peak of the curve 
is in that region for both value of i. 

model, we suppose that at each time t, we observe a N- 
dimensional vector whose entries are given by Rn := 
Ait + o'Xit for any i = 1,... ,N where the signal is 
contained in the variable An which is perturbed by 
an additive noise a-Xn G We will assume that 
the entries of X = (Xn) € are i.i.d. Gaussian 

random variables with zero mean and unit variance. In 
the case where the number of samples T ^ N, the 
empirical covariance matrix given by 

M = ;^RR^ = yA -b crX) (A -b crX)1' (III.22) 



















is a good estimator of + ct^Iat. This model 

is of particular interest in signal processing, in order 
to detect the number of sources and their direction of 
arrival [34]. Another example of application of this 
model comes from Finance where one may want to 
estimate the integrated covariance matrix from high- 
frequency noisy observation where the matrix X plays 
the role of the microstructure noise [35]. 

As usual, in the case where T ~ 0{N), the 
empirical estimator cannot be fully trusted. The main 
assumption of the model is again the convergence of 
the empirical density of eigenvalues of C := T~^AA^ 
towards a limiting density pc- The global law of the 
Information-Plus-Noise matrix reads, in a matrix sense: 

(Gm(^)) = {{zZ{z) - a2(l - q)) - Z{z)-^C )-^, 

(111.23) 

where we have defined 

Ziz) = l-qa^SM{z). (111.24) 


This global law result (III.23) has already been ob¬ 
tained in a mathematical context by the authors of [36] 
for applications in wireless communications and signal 
processing. However, it is satisfactory to see that the 
replica method is able to reproduce this result. If we 
take the normalized trace of the above equation, we 
find that the Stieltjes transform reads 





_ dcpc(c) 

qcr'^miz:)) - 0 - 2(1 - q) 


_c_ ’ 


(III.25) 


which is the result obtained in [33]. As far as we 
understand, the authors of [36] did not discuss the 
overlaps in the present context. The final expression 
for 0(A, c) is quite cumbersome, but again completely 
explicit, and reads: 


0(A,c) 

Q!3(A) 

Xi(^) 

X2(A) 

C(A) 


ga2Q3(A)(AQ3(A) -fc) 
xf(A,c)+xi(A,c) ^ 

C^(A)+qWp^(A) 

(Aa 3 (A) - c)C(A) - Q: 3 (A)cr^(l - q) 
(XasiX) + c)qaTrpMiX) 

1 - gcr^()M(A). 


For the sake of completeness, we provide a numer¬ 
ical example for the overlap (III.26) where A is a 
Gaussian matrix of size N x T with q = 0.5 and 
N — 500 with variance 1. The perturbation is a 



Fig. 4. Rescaled overlap 0(A,c) as a function of Cj in the 
information-plus-noise model with N = 500. We chose A and X to 
be a A X T Gaussian matrix and T = 2N. The black dotted points 
are computed using numerical simulations and the plain curves are 
the theoretical predictions Eq. (III.26). For i = 250 (resp. i = 400), 
we have cj ^ 0.83 (resp. ^ 1.66) and we see that the peak of 
the curve is in that region for both value of i. 


Gaussian noise of same size with cr = 1. The procedure 
is the same than in the previous section and we give a 
numerical example in Fig. 4. 

IV. Optimal rotational invariant estimator 

The above resolvent and overlap formulas for vari¬ 
ous models of random matrices are the central results 
of this study. Equipped with these results, we can now 
tackle the problem of the optimal RIE of the signal C. 
As we shall see, the high-dimensional limit N —i' oo 
allows one to reach some degree of universality. Eirst, 
we rewrite the RIE ((II.3)) as: 

V / cpc(c)0(Ai,c)dc. 

->oo JV J 

Quite remarkably, as we show below, the optimal RIE 
can be expressed, in the three above cases, as a function 
of the spectral measure of the observable (noisy) M 
only. 

Let us however stress that the nonlinear “shrinkage” 
estimators we obtain below are a priori valid in 
the support of M only. An interesting problem for 
future research would be to extend the results obtained 
here for the bulk eigenvalues to the spiked eigenvalues, 
also called outliers. Here, we assume that there are 
no spikes and we perform the optimal RIE for each 
models. 
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A. Free additive noise 

We now specialize the RIE and we begin with the 
free additive noise case for which the noisy measure¬ 
ment is given by 

M = C + OBO^ 


It is easy to see from Eqs. (II. 8 ) and (III.2) that: 


U = 


1 


lim Im 


cpcjc) 
Z{z) - c 


dc 


TTp^i^Xi) z^\i—iO+ 

——lim ImTr[GM(^)C] , 

NTTz^Xi—tO+ 


(IV. 1) 


where Z{z) is given by Eq. (III.3). Erom Eq. (III.2) 
one also has Tr[GM( 2 )C] = N{Z{z)gM{z) — 1 ), and 
using Eqs. (III.3) and (III.5), we end up with: 

lim ImTr [Gm( 2 :)C] 

z-).A-iO+ 

= NirpMiX) [X - a(X) - /^(X)Ijm(X)] . 

We therefore find the following optimal RIE nonlinear 
“shrinkage” function Fi: 

Ci = ^i(^) = X — ai(A) — /3 i(A)I)m(A), 

(IV.2) 

where ai, /?i are defined in Sect. III.A, Eq. (III.5). This 
result states that if we consider a model where the 
signal C is perturbed with an additive noise (that is 
free with respect to C), the optimal way to ’clean’ the 
eigenvalues of M in order to get S(M) is to keep the 
eigenvectors of M and apply the nonlinear shrinkage 
formula (IV.2). We see that the non-observable oracle 
estimator converges in the limit N —>■ oo towards a 
deterministic function of the observable eigenvalues. 

Deformed GOE: Let us consider the case where B 
is a GOE matrix. Using the definition of cti and /3i 
given in Eq. (III.5), the nonlinear shrinkage function 
is given by 

Ui(A) = A-2a2f)M(A). (IV.3) 

Moreover, suppose that C is also a GOE matrix so 
that M is a also a GOE matrix with variance cr^ = 
cr^ -I- (j^. As a consequence, the Hilbert transform of 
M can be computed straightforwardly from the Wigner 
semicircle law and we find 


The optimal cleaning scheme to apply in this case is 
then given by: 

= ■' 

where one can see that the optimal cleaning is given 
by rescaling the empirical eigenvalues by the signal-to- 
noise ratio. This result is expected in the sense that we 
perturb a Gaussian signal by adding a Gaussian noise. 
We know in this case that the optimal estimator of the 
signal is given, element by element, by the Wiener filter 
[37], and this is exactly the result that we have obtained 
with (IV.4). We can also notice that the ESD of the 
cleaned matrix is narrower than the true one. Indeed, 
let us define the signal-to-noise ratio SNR = G 

[0,1], and it is obvious from (IV.4) that S(M) is a 
Wigner matrix with variance (j| x SNR which leads to 

CTm > CTC > CTC X SNR, (IV.5) 

as it should be. 

As a second example, we now consider a less trivial 
case and suppose that C is a white Wishart matrix with 
parameter go- For 7 o > 0 ^ it is well known that the 
Wishart matrix has nonnegative eigenvalues. However, 
we expect that the noisy effect coming from the GOE 
matrix pushes some true eigenvalues towards the neg¬ 
ative side of the real axis. In Eig 5, we clearly observe 
this effect and a good cleaning scheme should bring 
these negative eigenvalues back to positive values. In 
order to use Eq. (IV.3), we invoke once again the free 
addition formula to find the following equation for the 
Stieltjes transform of M: 

0 = - qoCr^gM(z)^ + (cr^ + Qo z)gM(zf 
+ (l-qo- z)gM(z) + 1, 

for any z = X —ip with 77 —> 0. It then suffices to take 
the real part of the Stieltjes transform that solves 

this equation^ to get the Hilbert transform. In order to 
check formula Eq. (IV.2) using numerical simulations, 
we have generated a matrix of M given by Eq. (III.l) 
with C a fixed white Wishart matrix with parameter qg 
and OBO^ a GOE matrix with radius 1. As we know 
exactly C, we can compute numerically the oracle 
estimator as given in (II.3) for each sample. In Eig. 
6 , we see that our theoretical prediction in the large 
V limit compares very nicely with the mean values 


f)M(A) 


A 

2cr^ 


^We take the solution which has a strictly nonnegative imaginary 
part 
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Fig. 5. Eigenvalues of the noisy measurement M (black dotted line) 
compared to the true signal C drawn from a 500 x 500 Wishart 
matrix of parameter go = 0.5 (red line). We have comrpted the 
signal by adding a GOE matrix with radius 1. The eigenvalues 
density of M allows negative values while the true one has only 
positive values. The blue line is the LSD of the optimally cleaned 
matrix. We clearly notice that the cleaned eigenvalues are all positive 
and its spectrum is narrower than the true one, while preserving the 
trace. 



Fig. 6. Eigenvalues according to the optimal cleaning formula (IV.4) 
(red line) as a function of the observed noisy eigenvalues A. The 
pai'ameter are the same as in Fig. 5. We also provide a comparison 
against the naive eigenvalues substitution method (black line) and 
we see that the optimal cleaning scheme indeed narrows the spacing 
between eigenvalues. 


of the empirical oracle estimator computed from the 
sample. We can also notice in Fig. 5 that the spectrum 
of the cleaned matrix (represented by the BSD in blue) 
is narrower than the standard Marcenko-Pastur density. 
This confirms the observation made in Sec. II-A. 

B. Free multiplicative noise 

By proceeding in the same way as in the additive 
case, we can derive formally a nonlinear shrinkage 
estimator that depends on the observed eigenvalues A 


of M defined by 

M = vLobo^vL. 


Following the computations done above, we can find 
after some manipulations of the global law estimate 
(III. 8 ): 

Tr(GM(2:)C) = N{zqm{z) - l)5B(zflM(2:) - !)■ 

(IV. 6 ) 

Using the analyticity of the 5-transform, we define the 
function 73 and wb such that; 

lim 5 b(z0m(2) - 1) := 7 b(A) -f i 7 rpM(A)a;B(A) , 

z->'A-i0+ 

(IV.7) 

and as a result, the optimal RIB (or nonlinear shrinkage 
formula) for the free multiplicative noise model (III.7) 
reads: 


Ci = F2{\)', F2{\) = A7B(A)-l-(AhM(A) —l)a;B(A), 

(IV. 8 ) 

and this is the analog of the estimator (IV.2) in the 
multiplicative case. 

Empirical covariance matrix: As a first application 
of the general result Bq. (IV. 8 ), we reconsider the ho¬ 
mogeneous Marcenko-Pastur setting where B = XX^ 
with X defined as in Bq. (III. 14). We trivially find from 
the definition of the 5-transform (III. 15) that (IV.7) 
yields in this case; 


7b (A) 


wb(A) 


l-q + gAl)M(A) 

|l-q-fgA lim 0 m(- 2 :)P 

2—>-A—i0+ 

__ 

ll-g-fgA lim flM(^)P ’ 

2->-A-i0+ 

(IV.9) 


The nonlinear shrinkage function F 2 thus becomes: 


' (l-g + gAMA))2 + g%2^^(A)’ 

(IV. 10) 

which is precisely the Ledoit-Peche estimator derived 
in [11]. Let us insist once again on the fact that this is 
the oracle estimator, but it can be computed without the 
knowledge of C itself, but only with its noisy version 
M. This “miracle” is of course only possible thanks to 
the N ^ 00 limit that allows the spectral properties 
of M and C to become deterministically related one to 
the other. 

Like in the additive case, we can give a pretty 
insightful application of the formula (IV. 8 ) based on 
Bayesian statistics once again. Let us suppose that 
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C is a white inverse-Wishait matrix (i.e. is a 

white Wishart matrix of parameter q). The eigenvalue 
distribution of C can then be computed exactly by 
performing the following change of variable"^ ^ = 
{{1 — q)c)~^ in the Marcenko-Pastur density function 
to get 

Pc(m) = 

p± := -[k + 1± V2^^TT], (IV.ll) 

K 

with q = {2k + 1)“^ and k the hyper-parameter 
which is positive. From there, one can compute the 
corresponding Stieltjes transform of C 

(1 -f k)z - K±KyJ{z - P+)(Z - p_) 

0c(^) =--, 

(IV. 12) 

and we can also compute the Stieltjes transform of 
the perturbed matrix M thanks to the Marcenko-Pastur 
equation (III. 17); 


1piz,K) 


z{l + k) — k{1 — q)± ^/i’iz, k) 
z{z -I- 2qK) ’ 

(k(1 -q)- z{l + K)f 
-z{z + 2qK){2K + l) (IV.13) 


The reason why we insist on this matrix ensemble is 
that it plays a special role in multivariate statistics, 
especially for estimating covariance matrices because 
the famous linear shrinkage estimator [38] turns out to 
be exact in this case, in the sense that it corresponds to 
the RIE as defined in the introduction. We can recover 
this result within the present formalism. Indeed, the 
use of Eq. (IV.13) in the estimator (IV. 10) leads, after 
some computations, to: 

F 2 {\) = Q;A-|-(1 —a), with a := - ^ -. 

1 -I- 2qK 

(IV. 14) 

This is the linear shrinkage estimator that tells us to 
replace the noisy eigenvalues by a linear combination 
of the noisy eigenvalues and unity. 

Elliptical Ensemble: In this subsection, we now 
consider the elliptical model (III. 19) where the noise 
term is given by 

B = XEX^ 


for an arbitrary diagonal T x T matrix E. One 
immediately sees that the optimal shrinkage formula 


^The factor 1 — q is such that TrC = N, which follows from the 
Marcenko-Pastur equation. 



A 

Fig. 7. Eigenvalues according to the optimal cleaning formula 
(IV.8) (red line) as a function of the observed noisy eigenvalues A 
when C is an inverse-Wishart matrix (with parameter k = 0.2 and 
the S = diag({c7^}t) is distributed according the Marcenko-Pastur 
density (with parameter qq = 0.5). We compare the result against 
numerical simulations (blue points) and the agreement is excellent. 
We furthermore compute the optimal cleaning scheme when S = I-p 
(black dotted line) and we see that E allows one to go from linear 
to nonlinear shrinkage. 


(IV.8) will now depends on q = N/T and on the 
spectral measure of E, which prevents us to get a 
tractable form as in the homogeneous Marcenko- 
Pastur case (IV. 10). However, we may expect to find 
a nonlinear shrinkage formula even when the signal 
is given by an Inverse-Wishart matrix. Indeed, the 
optimal cleaning formula is given by Eq. (IV.8) where 
we can compute numerically the 5-transform of B 
using the free multiplication and Eq. (III.20) for any 
S. We illustrate this in Eig. (7) where the eigenvalues 
of E are generated following Marcenko-Pastur density 
and we see that the estimator (IV.8) clearly deviates 
from the linear shrinkage (IV. 14). 


As a second example, we consider the Student en¬ 
semble of correlation matrices [27], [30] which has en¬ 
countered some success in quantitative hnance because 
it allows one to construct non-Gaussian correlated data 
with a clear interpretation of the matrix E. We impose 
the to be distributed according to an inverse- 

gamma distribution, i.e. 






j-l+Ai 


(IV. 15) 


where we set Cg := (/i — 2)/2 and p > 2 in 
order to have (u^) = 1. Within such prescription, the 
sample data := cTtYu, with = C,j, is 

characterized by the multivariate Student distribution 
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Fig. 8. Eigenvalues according to the optimal cleaning formula 
(IV.8) as a function of the noisy observed eigenvalues A when C is 
an inverse-Wishart matrix (with parameter k, = 0.2) and the S = 
diag({(T^}t) is generated according an inverse-gamma distribution 
(IV.15) (with parameter fi = 6). We compare the RIE (IV.8) (red 
line) against numerical simulations (blue points) and the agreement 
is quite convincing, especially in the bulk. We compare it with the 
substitution procedure (black dotted line) which leads to a wider 
spectrum. 

of parameters p and N [26]. From a financial per¬ 
spective, this parametrization can be useful as a model 
where all individual stock returns are impacted by the 
same, time dependent scale factor at that represents 
the “market volatility” (see [39] for a discussion of 
this assumption). From empirical studies, one possible 
choice that matches quite well the data is to choose 
Eq. (IV. 15) with /x « 3 — 5. The results above allow 
us to compute numerically either the LSD or the RIE 
for an arbitrary “true” signal C, thus generalizing the 
work done in [27]. 

We plot in Eig. 8 the RIE (IV. 8 ) when the eigenval¬ 
ues of E are generated following the inverse-gamma 
distribution with /x = 6 and C is still an inverse- 
Wishart matrix of parameter k = 0.2. The numerical 
procedure is the same as for the previous example. The 
results we obtain are quite convincing, especially in the 
bulk. The noisy fluctuations for the largest eigenvalues 
in Eig. 8 can be explained by the difficulty to solve Eq. 
(III. 10) and (III.21) outside of the bulk, most notably 
due to the inversion of the T-transform of E. However, 
we see that these large eigenvalues still have the right 
behavior in the sense that they are shrunk downward 
compared to the “naive” substitution procedure. 

C. Information-Plus-Noise matrix 

The derivation of the asymptotic RI estimator for 
the Information-Plus-Noise model is a bit more tedious 



Fig. 9. Eigenvalues according to the optimal cleaning formula 
(IV. 16) as a function of the noisy observed eigenvalues A with the 
same setting than in Sec. (III-C). We compare the estimator (IV. 16) 

(red line) against numerical simulations coming from a single sample 
with N = 500 (blue points), and the agreement is excellent. 

compared to the previous cases but one can follow 
the same route to hnd the desired result. We leave the 
complete derivation to the reader; the hnal formula for 
the corresponding shrinkage function F 3 reads: 

^3(A) = C(A)(A-(T^(l-9)-2(3'cr2Af)M(A))-t-gcr^(l-7M(A)), 

(IV. 16) 

where 1)m(A) is as before the Hilbert transform of the 
probability density pM, C(A) is given in (III.26) and 
the function 7m is dehned by 

7m(A) = l)M(A)(A-cr^(l-(?))-fqCT2A(7rVM(A)-t}M(A))- 

If we consider the trivial case of zero noise (i.e. a = 0), 
we have by dehnition that M = C and we indeed see 
this in Eq. (IV. 16) where the optimal shrinkage formula 
becomes = A^. The other limit that can be studied 
without much effort is when the sample size becomes 
much larger than the number of variable (i.e. q = 0). 

In this case, we know that M = C -\- a'^l^ by the 
law of large number. The optimal shrinkage (IV. 16) 
gives in that case = \i — <t^ which was expected 
because the observation matrix M is simply a shift 
of the signal by a factor a^. Let us now reconsider 
the same numerical example of Section (III-C) and we 
apply the same procedure to test the RIE (IV. 16) than 
the last two sections. We clearly see in Eig. 9 that the 
agreement is remarkable even for a hnite N. 

V. Conclusion and open problems 

As we recalled in the introduction, RMT is already 
at the heart of many signihcant contributions when it 
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comes to reconstructing a true signal matrix C of large 
dimension from a noisy measurement. In this paper, we 
revisited this statistical problem and considered the so- 
called oracle estimator which is optimal with respect to 
the Euclidean norm. In particular, we have established 
the global resolvent law for three distinct ensembles 
of random matrices that embrace well-known models 
like the deformed Wigner or the sample covariance 
matrix. These results on the asymptotic convergence 
have two important applications: (i) they allow us to 
find exact results on the overlap between the eigen¬ 
vectors of the signal matrix with the corrupted ones; 
(ii) most importantly, they lead to the ‘miracle’ which 
allows the oracle estimator to be expressed without any 
knowledge of the signal C in the large N limit. This 
last observation, that generalizes the work of Ledoit 
and Peche [11], should be of particular interest in 
practical cases. 

We emphasize that the proposed estimators are opti¬ 
mal (in the norm sense) when the dimension of the 
problem becomes very large and under no particular 
prior beliefs on the eigenvector structure of the true 
matrix C. However, it happens in practice that one 
could have a prior structure on the eigenvectors of C 
(factor models), and it would be interesting to see how 
can we rewrite our problem in a non-RI framework. 
This natural extension is left for future work. Note that 
the case of so-called ‘spiked’ eigenvalues/eigenvectors 
in the context of empirical covariance matrix was 
recently solved in [40]. 

We also highlighted the interesting connexion be¬ 
tween our work with some famous result of Bayesian 
statistics. For instance, we found out that our results 
generalize the Wiener filter [37] (additive case) but also 
the linear shrinkage [38], [41] (multiplicative case), 
and both have encountered many successes in practical 
cases. Moreover, the Bayesian theory has found several 
applications in modern statistical analysis, especially 
because the large amount of data may allow one to 
identify a pattern in the data which could be used 
as a prior. The different estimators we proposed are 
powerful in the sense that they depend only on ob¬ 
servable quantities. Differently said, they can be used 
in many different contexts with no parameter to fit. 
This is in adequation with the recent work of [42] and 
[40] which provide efficient numerical methods to use 
these (limiting) results for real life applications, in the 
specific case of sample covariance matrices. One could 


also extend the above results to more realistic models 
of covariances that accounts for autocorrelation effects 
[28] or fat-tailed distributions [32]. 

Although our computations are based on the non¬ 
rigorous Replica method, the comparison between our 
theoretical formulas and empirical simulations demon¬ 
strates the robustness of each proposed estimator. 
Hence, one can certainly think of possible extensions 
of this work based on the same method. For instance, 
a natural extension for our free additive perturbation 
model would be given by 

M = C4-0,B0J (V.l) 

where the law of the matrix Oq G 0{N) interpolates 
between the Haar measure on the Orthogonal group 
0{N) when q = 0 and a given measure on the 
permutation group when q = -foo. Differently said, M 
interpolates between the free and the classical addition. 
A natural prescription would be to imagine that Oq is 
the result of Biane’s Brownian motion [43] on 0{N). 
Another natural possibility is to assume that Oq is dis¬ 
tributed according to the probability measure with the 
Harish-Chandra-Itzykson-Zuber (HCIZ) weight [44], 
[45]: 

Vq{dO) (X exp [gA^TrCOBO'^] dO (V.2) 

which has the right limits when g —> 0 (Haar measure 
on the orthogonal group) and q oo (determinis¬ 
tic measure rearranging the spectrum of B in non¬ 
increasing order). Hence, considering a replica method 
for this specific case might give us access to the global 
law of the resolvent of M which should enable us to 
express the correlation function of angular integrals, 
leading to an alternative expression of the Morosov- 
Shatashvili formula [46], [47] expressed in terms of 
the free energy of HCIZ integrals. 
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Appendix A 

Reminder on transforms in RMT 

We give in this first appendix a short reminder on 
the different transforms that are useful in the study of 
the statistics of eigenvalues in RMT due to their link 
with free probability theory (see e.g. [48] or [49] for 
a review). The resolvent of M is defined by: 

Gm(z) := (zIat-M)-!, (A.l) 

and the Stieltjes (or sometimes Cauchy) transform is 
the normalized trace of the Resolvent: 

flM(z) 


The Stieltjes transform can be interpreted as the aver¬ 
age law and is very convenient in order to describe the 
convergence of the eigenvalues density pM- If we set 
z = A — iry and take the limit 77 —> 0, we have in the 
large N limit 


—TrGM(z), 

1 ^ 1 

N Z — Xk N^oo 

k—1 


Pm (A) 
z — A 


dA. 


where the real part is often called the Hilbert transform 
f)M(A) and the imaginary part leads to the eigenvalues 
density. 

When we consider the case of adding two random 
matrices that are (asymptotically) free with each other, 
it is suitable to introduce the functional inverse of the 
Stieltjes transform known as the Blue transform [21] 

^M(flM(z)) = z. (A.3) 

This allows us to define the so-called 7^-transform 

7^M(^) := Bm{z) - (A.4) 

z 

which can be seen as the analogue in RMT of the 
logarithm of the Fourier transform for free additive 
convolution. More precisely, if A and B are two 
N X N independent invariant symmetric random ma¬ 
trices, then in the large N limit, the spectral measure 
of M = A -f B is given by 

7^M(z) = 7^A(z) -f 7^B(z), (A.5) 


known as the free addition formula [20]. In this case, 
we note by pahb the eigenvalues density of M. 

We can do the same for the free multiplicative 
convolution. In this case, we rather have to define the 
so-called T (or sometimes 77 [3] ) transform given by 

7m(z) = J = zpm(z) - 1 , (A. 6 ) 


which can be seen as the moment generating function 
of M. Then, the 5-transform of M is then defined as 

z-f 1 


5m(z) := 


zTmHz) 


(A.7) 


where 7 )(j^^(z) is the functional inverse of the T- 
transform. Before showing why the 5-transform is 
important in RMT, one has to be careful about the 
notion of product of free matrices. Indeed, if we recon¬ 
sider the two N X N independent symmetric random 
matrices A and B, the product AB is in general not 
self-adjoint even if A and B are self-adjoint. However, 
if A is positive definite, then the product -^ABy^ 
makes sense and share the same moments than the 
product AB. We can thus study the spectral measure 
of M = -\/ABVa in order to get the distribution of 
the free multiplicative convolution pakib- The result, 
first obtained in [ 20 ], reads: 


5akib(z) := 5m(z) = 5a(z)5b(z). (A. 8 ) 


Gm(A, - i77) = RV. J + mpuiX) 


The 5-transform is therefore the analogue of the 
Fourier transform for free multiplicative convolution. 
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Appendix B 

Derivation oe the global law estimate 
A. The replica method 

Throughout the following, we shall use the abbre¬ 
viation G(z) = Gm(^) for simplicity. The starting 
point of our approach is to rewrite the entries of 
the resolvent G(2) := {Gij{z)) € using the 

Gaussian integral representation of an inverse matrix: 


G.,{z) = 

/ VtVj exp I i Em=i r]k{.z5k^i - Mu)m 1 11^=1 d?7fc 

/exp {-5 Em=i r]k{z5ki - JlLi dr/fe 

(B.l) 


We recall that the claim is that for a complex 0 not 
too close to the real axis, we expect the resolvent to 
be self-averaging in the large N limit, that is to say 
independent of the specific realization of the matrix 
itself. Therefore we can study the resolvent Gm(z) 
through its ensemble average (denoted by (•) in the 
following) given by: 


{Gij{z)) = 

1 

'Z 


r 

' 1 ^ 

/ rjiTjj exp 

“2 X] hk{zSki - Mki)r]i 

k,l = l 


n d^fe 


(B.2) 


where Z is the partition function, i.e. the denominator 
in Eq. (B.l). The computation of the average value 
is highly non trivial in the general case. The replica 
method tells us that the expectation value can be 
handled thanks to the following identity 


{Gij(z)) 


lim ( Z^- 
n —>-0 


^ J (Y[ X 

1 ^ 

-- rikiz&k,! - Mki)'ni 
^ k,l=l 

/“o / f n n d’lfe) nWj X 

\k = la = l / 


(B.3) 


We have thus transformed our problem to the compu¬ 
tation of n replicas of the initial system (B.l). So when 
we have computed the average value in (B.2), it suf¬ 
fices to perform an analytical continuation of the result 
to real values of n and finally takes the limit n —> 0. 
The main concern of this non-rigorous approach is that 
we assume that the analytical continuation can be done 
with only n different set of points which could lead 


to uncontrolled approximation in some cases [50]. We 
stress that the replica trick (B.3) allows to investigate 
each entry TV} of the resolvent Gm- 

This result we get is thus stronger than classical tools 
from free probability theory which concerns only the 
Stieltjes transform, i.e. the normalized trace of the 
resolvent. 


B. Free additive noise 

We consider a model of the form 


M = C + OBO^ 


where B is a fixed matrix with eigenvalues 61 > &2 > 
■ ■ ■ > hjsi with spectral and O is a random matrix 
chosen in the Orthogonal group 0{N) according to 
the Haar measure. Clearly, the noise term is invariant 
under rotation so that we expect the resolvent of M to 
be in the same basis than C. We therefore set without 
loss of generality that C is diagonal. Throughout the 
following, we fix j € {1,..., iV} with possibly j 7^ 
i. In order to derive the global law estimate for the 
resolvent of the matrix M, we have to consider the 
ensemble average value of the resolvent over the Haar 
measure for the 0{N) group, which can be written as 
follow 


{Gij{z)) = (n n n 






(B.4) 


The evaluation of the later equation can be done 
straightforwardly if we set the measure dO to be a 
flat measure constrained to the fact that OO^ = Itv, or 
equivalently said: 


VO (X J/ dOij nME OikOjk 






where S{-) is the Dirac delta function and 6ij is 
Kronecker delta. In the case where n is finite (and 
independent of N), one can notice that Eq. (B.4) is the 
Orthogonal low-rank version of the Harish-Chandra- 
Itzykson-Zuber integrals ([44], [45]). The result is 
known for all symmetry groups ([51], [52] or [53] for 
a more rigorous derivation), and reads for the rank-n 
case 


220 exp 


Tr 




as. t 


OBO’ 


= exp 


N 

y 


a. = l 



(B.5) 

(B.6) 


with Wb the primitive of the 7^-transform of B. We 
emphasize that (B.5) can be obtained using a simple 
saddle-point calculation valid when n is finite, in the 
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limit TV —> oo and its explicit structure simplifies a 
lot the following computations compared to standard 
approaches that uses Weingarten calculus (see e.g. 
[54]). 

By plugging (B.5) into (B.4), we see that we need 
to compute: 



which gives the following relation for the Stieltjes transform: 
0 m(«) = 0 c(2 — 'TIb{p*)) = p*, which is equivalent to: 

P* = 0 m(2 ) = 0 c (« - 7 ?.b (0m(2 ))) , (B.IO) 

and therefore 

C* =7^B(0M(^)). (B.ll) 

In conclusion, by plugging (B.IO) and (B.ll) into (B.7) 
and then taking the limit n —>■ 0 , we obtain the global law 
estimate 

(Gijiz)) = {Z{z)In - C)-lSij (B. 12 ) 

with 

Z(z) = z - Tin {gu{z)), (B.13) 


where we introduced a Lagrange multiplier p“ = which gives exactly the matrix result stated in Eq. (III.2) and 

which gives using Fourier transform (re- (ni, 3 ) since the indexes i and j are arbitrary, 

naming = —2iQ°‘/N) 


(G.,(.)>oc III (n n/<) ^ 


N 


E [>VB(p“)-p“r]| X 

C n IV -> 

vlvj exp ^ - r - Cfe) . 

I = = l J 

This additional constraint allows to retrieve a Gaussian 
integral over the {rjj} which can be computed exactly. 
Ignoring normalization terms, we obtain 

{Gij{z)) ^11 


z + C^ - a 


exp{-^Fb(p“,C“)} dp“dC“j (B.7) 


where the ‘free energy’ Fq is given by 


1 " 


y]iog(»-c"- 


Ck) 


a=l 

- Wb(p“)+p“C° 


(B. 8 ) 


C. Free multiplicative noise 

Let us set the measurement matrix M as: 

M = VCOBO^VC 


(B.14) 


where O is still a rotation matrix over the Orthogonal group, 
C is a positive definite matrix and B is such that TrB 7 ^ 0. 
Note that we can assume without loss of generality that C 
is diagonal because the argument of the previous subsec¬ 
tion still applies. As in the previous section, we may fix 
i,j € {1,...,A^} throughout the following without loss of 
generality as the generalization to the matrix relation (III. 8 ) 
will be immediate. Using the abbreviation 6 ( 2 ) = Gm( 2 ), 
the replica identity (B.3) allows us to write the entries of the 
resolvent of M as follow 


\a^l k = l / 

^g 5 E”i = i ES = i (VCOBotVC)fci77“^ ^ (B.15) 
We can notice that the matrix 5I[a=i 

is a symmetric rank-n matrix, with n finite and independent 
of N. Therefore, the expectation over the Haar measure still 
leads to a rank-n Orthogonal version of HCIZ integral, and 
the result reads [53]: 


In the large N limit, the integral can be evaluated by 
considering the saddle-point of the free energy Fq as the 
other term is obviously sub-leading. We now use the replica 
symmetric ansatz that tells us if the free energy is invariant 
under the action of the symmetry group 0{N), then we 
expect a saddle-point which is also invariant. This implies 
that we have at the saddle-point 

p“=p and C=C, Va G {1,... ,n}, (B.9) 

from which we find the following solution: 

fc* =7^B(p*) 

[p* = 0 c{z - C) • 

The trick is to see that we can get rid of one variable by 
taking the normalized trace of the (average) resolvent (B.7) 



^ N n 

9 1] ^p^^/coBot^/c)fc,77^ 


exp 


N 


n / N 

E»'.(2EW) 


(B.16) 


As in the free addition case, let us defined the auxiliary 
variable p“ = ;^ that we enforce by a Dirac 

delta function. By following the same step as in the previous 
section, we get a Gaussian integral over the {p^} that we 
compute to eventually find that 


exp |-^Tb(p“,C“)| 

^ ^ a = l 


(B.17) 
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where the free energy is given by 






N 


C/c j 


+ C>“-Wb(p“) 


(B.18) 


We now assume that the saddle-point solution can be com¬ 
puted using the replica symmetry ansatz (B.9), so that the 
free energy becomes 

1 

■Fo(p“,C“) = Fo{p,C) = -TT^log{z-Cck)+Cp-WBip). 


(B.19) 


We first consider the derivative w.r.t. p which leads to 

C* = 7^Bb). (B.20) 

The other derivative gives 

Tc 


p = 




(B.21) 


7?.b(J3*) 

Hence, we see that the resolvent is given in the large N limit 
and the limit n —>■ 0 by 

6 i. 




(B.22) 


It now remains to determine p* and to that end, we can find 
a genuine simplification in this last expression by using the 
connexion with the free multiplication convolution. By taking 
the normalized trace in the latter equation, it yields 

z 


zqm{z) = Zqc{Z), with ^ ^ 
which can be easily rewrite using (A.6) as 
Tm{z)=Tc{Z). 

Next, let us define 


(B.23) 


uj = Tm{z) = Tc{Z) (B.24) 

which implies from (B.21) that p* — uj/TZ-b{p*)- Then, we 
may rewrite the Eq. (B.24) as 

zTu{z) = ZTc{Z)TZk{p*) . 

It is trivial to see that plugging uj into this latter expression 
yields ujT^^ipj) — {uj)TZb{p’*), and by the definition 

(A.7) of the <S-transform, we have 

In order to retrieve the desired result, we use the following 
relation 

(p* (p*)), (B .26) 

that follows from the very definition of the <S-transform of 
B. But recalling that p* = lo /TZb.(p*), we conclude from 
(B.20), (B.24) and (B.26) that 

^ = 71b(p*) = Sb{Tm{z)). (B.27) 

c 


Then going back to (B.25), we see that the spectral density 
of M is given by the free multiplication 

<SM(at) = Sc{^)Sb(i-F), (B.28) 

as expected and it proves that the replica symmetry ansatz 
holds in this model. Finally, by plugging (B.27) into (B.23), 
we conclude that we can characterize the asymptotic global 
law of the resolvent of M by a deterministic quantity: 

{G.Az)) = - - H -• (B.29) 

^ SbCTmIz)) 

All in all, the global law of the resolvent of the free 
multiplication M = is given by 

z{G.Az)) = 5,,,Z{z){Z{z)-c,)-\ 

Z{z) ■■= zSB{zgu{z) - 1) (B.30) 

which is exactly the result stated in Eq. (III.8) since the 
indexes i, j were arbitrary throughout the computations. 


D. Information-Plus-Noise matrix 

The computation of the global law estimate for this model 
is pretty similar to the sample covariance [26]. The noisy 
measurement matrix is given by 

M= i(A + aX)(A-f crX)^' 

with A a fixed NxT matrix such that T~^AA^ = C. As we 
posit that X is a Gaussian matrix, we can work in the basis 
where C is once again diagonal. Moreover, we can in fact 
show that interchanging the integral and the average leads 
to the same result. We hence consider directly the annealed 
average and abbreviate G(2 ;) = Gm]^) in order to lighten 
the notations. Let us compute the entries of G using (B.3) 
with n = 1 and we find 



where we have defined Yt := At -f aXt £ which is 
still a Gaussian vector. One can readily compute the average 
value over the measure of Y to find 



1 

2 f 








t 


1 

) E 



where we omitted all constants terms and used Sherman- 
Morrison formula in the exponential term. Rewriting p = 
a^T~^p^ri that we enforced by a Dirac delta function, we 
can therefore compute the integral over {pk} to find 


{Gij{z)} = dp dC 


z-qCa^-T^ 


■ exp 


N 

(B.31) 
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and we can once again compute the integral in the large N 
limit hy performing the saddle-point of the following free 
energy 

Fo{p,0 ■■= logil - p) - p( (B.32) 

1 ^ 

+ ;^5Zlog[(2-'7cr^C)(l-p)-Cfe] • 

fc=i 

The derivation over gives the following equation p* = 
q(j^{l — p*)gc {{z — — p*)), and hy taking the 

normalized trace of the resolvent, we see that we have 

p* = qa'^gmiz). (B.33) 

The other derivative leads to 

C =- f zgmiz). (B.34) 

q 

Therefore, by plugging and p* into (B.31) and then by 
sending N ^ oo leads to: 

{G.,{z)) = {{zZiz) - a^{l - q)) - Z{z)-^C)-^ , 

(B.35) 

with 

Z{z) = I - qa^guiz). (B.36) 

This is exactly the result announced in Eqs. (IIL23) and 
(IIL24) or in [36], showing that considering directly the 
annealed average is indeed correct in the limit N ^ oo. 

Appendix C 

Abbreviations and Symboes 

Symbol Definition 

RMT Random Matrix Theory 

ESD Empirical Spectral Density 

LSD Limiting Spectral Density 

RI Rotational Invariance 

RIE Rotational Invariant Estimator 

SNR Signal to Noise Ratio 

G]vi(^) Resolvent of M, see Eq. (A.l) 

Pm{z) LSD of M 

Stieltjes transform of pM, see Eq. (A.2) 

Functional inverse of see Eq. (A.3) 

7^-transform of M, see Eq. (A.4) 

7m ( 2 ) Moment generating function of pM, see Eq. (A.6 ) 
cSm(-2 ^) cS-transform of M, see Eq. (A.7) 



